LH*: A Scalable, Distributed Data Structure | Distributed File Systems | H.2.2 [Database Management]: Physical Design - Access Methods | H.2.2 [Database Management]: Systems - Distributed Systems
Authors
Abstract
We present a scalable distributed data structure called LH*. LH* generalizes Linear Hashing (LH) to distributed RAM and disk files. An LH* file can be created from records with primary keys, or objects with OIDs, provided by any number of distributed and autonomous clients. It does not require a central directory, and grows gracefully, through splits of one bucket at a time, to virtually any number of servers. The number of messages per random insertion is one in general, and three in the worst case, regardless of the file size. The number of messages per key search is two in general, and four in the worst case. The file supports parallel operations, e.g., hash joins and scans. Performing a parallel operation on a file of M buckets costs at most 2M + 1 messages, and between 1 and O(log₂ M) rounds of messages. We first describe the basic LH* scheme, where a coordinator site manages bucket splits and splits a bucket every time a collision occurs. We show that the average load factor of an LH* file is 65-70% regardless of file size and bucket capacity. We then enhance the scheme with load control, performed at no additional message cost. The average load factor then increases to 80-95%. These values are about those of LH, but the load factor for LH* varies more. We next define LH* schemes without a coordinator. We show that insert and search costs are the same as for the basic scheme. The splitting cost decreases on average, but becomes more variable, as cascading splits are needed to prevent file overload. Next, we briefly describe two variants of the splitting policy, using parallel splits and presplitting, that should enhance performance for high-performance applications. Altogether, we show that LH* files can efficiently scale to files that are orders of magnitude larger in size than single-site files. LH* files that reside in main memory may also be much faster than single-site disk files.
Finally, LH* files can be more efficient than any distributed file with a centralized directory, or a static parallel or distributed hash file.
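The abstract states the message bounds but not the addressing rules behind them. The following is a minimal sketch of how a client with an outdated view of the file could still reach the right bucket without a central directory. It rests on assumptions not given in the text above: the usual linear-hashing family h_i(key) = key mod 2^i with a single initial bucket, a file state written as (split pointer n, level i), and a server-side forwarding check of the kind commonly described for LH*. All function names are hypothetical.

```python
def h(level, key):
    """Hash function h_level of the assumed family: key mod 2^level."""
    return key % (2 ** level)


def lh_address(key, n, i):
    """Linear-hashing address of key for a file state (split pointer n, level i)."""
    a = h(i, key)
    if a < n:  # bucket a has already been split at this level
        a = h(i + 1, key)
    return a


def bucket_levels(n, i):
    """Level of every bucket in a file with state (n, i); there are 2^i + n buckets."""
    levels = {}
    for a in range(2 ** i + n):
        # Split buckets (below n) and the buckets they created use level i + 1.
        levels[a] = i + 1 if (a < n or a >= 2 ** i) else i
    return levels


def server_route(key, a, j):
    """Address that server a (bucket level j) believes the key belongs to.

    Returns a itself when the key should stay in this bucket.
    """
    a1 = h(j, key)
    if a1 != a:
        a2 = h(j - 1, key)
        # Prefer the lower-level address when it lies strictly between
        # this bucket and a1; this avoids forwarding beyond the file.
        if a < a2 < a1:
            a1 = a2
    return a1


def send(key, client_image, file_state):
    """Route key from a client whose image (n', i') may lag the true file state.

    Returns (bucket that accepts the key, number of forwardings).
    """
    n, i = file_state
    levels = bucket_levels(n, i)
    a = lh_address(key, *client_image)  # the client's guess, possibly wrong
    hops = 0
    while True:
        nxt = server_route(key, a, levels[a])
        if nxt == a:
            return a, hops
        a = nxt
        hops += 1


# A client that still believes the file has one bucket, while the file
# actually has 11 buckets (n = 3, i = 3):
print(send(9, (0, 0), (3, 3)))  # -> (9, 2)
```

In this sketch the one initial message plus at most two forwardings is consistent with the worst case of three messages per insertion quoted in the abstract; a client with an up-to-date image needs no forwarding at all.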
Similar resources
Scalable Storage for a DBMS using Transparent Distribution
Scalable Distributed Data Structures (SDDSs) provide a self-managing and self-organizing data storage of potentially unbounded size. This stands in contrast to common distribution schemas deployed in conventional distributed DBMS. SDDSs, however, have mostly been used in synthetic scenarios to investigate their properties. In this paper we concentrate on the integration of the LH* SDDS into our...
LH*LH: A Scalable High Performance Data Structure for Switched Multicomputers
LH*LH is a new data structure for scalable high-performance hash files on the increasingly popular switched multicomputers, i.e., MIMD multiprocessor machines with distributed RAM memory and without shared memory. An LH*LH file scales up gracefully over available processors and the distributed memory, easily reaching Gbytes. Address calculus does not require any centralized component that could lead to a...
hQT*: A Scalable Distributed Data Structure for High-Performance Spatial Accesses
Spatial data storage stresses the capability of conventional DBMSs. We present a scalable distributed data structure, hQT*, which offers support for efficient spatial point and range queries using order-preserving hashing. It is designed to deal with skewed data and extends results obtained with scalable distributed hash files, LH*, and other hashing schemas. Performance analysis shows that an hQT...
O-Storage: A Self Organizing Multi-Attribute Storage Technique for Very Large Main Memories
Main memory is continuously improving in both price and capacity. With this come new storage problems as well as new directions of usage. Just before the millennium, several main memory database systems are becoming commercially available. The hot areas include boosting the performance of web-enabled systems, such as search engines and auctioning systems. We present a novel data storage struc...